Miyagi Prefecture
SoftMatcha 2: A Fast and Soft Pattern Matcher for Trillion-Scale Corpora
Yoneda, Masataka, Matsushita, Yusuke, Kamoda, Go, Suenaga, Kohei, Akiba, Takuya, Waga, Masaki, Yokoi, Sho
We present an ultra-fast and flexible search algorithm that enables search over trillion-scale natural language corpora in under 0.3 seconds while handling semantic variations (substitution, insertion, and deletion). Our approach employs string matching based on suffix arrays that scales well with corpus size. To mitigate the combinatorial explosion induced by the semantic relaxation of queries, our method is built on two key algorithmic ideas: fast exact lookup enabled by a disk-aware design, and dynamic corpus-aware pruning. We theoretically show that the proposed method suppresses exponential growth in the search space with respect to query length by leveraging statistical properties of natural language. In experiments on FineWeb-Edu (Lozhkov et al., 2024) (1.4T tokens), we show that our method achieves significantly lower search latency than existing methods: infini-gram (Liu et al., 2024), infini-gram mini (Xu et al., 2025), and SoftMatcha (Deguchi et al., 2025). As a practical application, we demonstrate that our method identifies benchmark contamination in training corpora, unidentified by existing approaches. We also provide an online demo of fast, soft search across corpora in seven languages.
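The abstract's two ingredients, exact lookup on a suffix array and pruning of relaxed queries that the corpus cannot satisfy, can be illustrated on a toy corpus. The following is a minimal sketch and not the authors' implementation: the function names, the in-memory naive construction, and the `alternatives` input (a list of candidate tokens per query position, standing in for semantic substitutions) are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right

def build_suffix_array(tokens):
    # Naive construction by sorting all suffixes; suitable for toy corpora only.
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])

def count_occurrences(tokens, sa, pattern):
    # Suffixes are in lexicographic order, so their length-m prefixes are too,
    # and every occurrence of the pattern forms one contiguous run.
    m = len(pattern)
    # Materialized here for clarity; a real index would compare lazily.
    keys = [tuple(tokens[i:i + m]) for i in sa]
    p = tuple(pattern)
    return bisect_right(keys, p) - bisect_left(keys, p)

def soft_count(tokens, sa, alternatives):
    # Hypothetical relaxation: each position admits several candidate tokens.
    # Prefixes that never occur in the corpus are dropped immediately, so the
    # candidate set cannot grow beyond what the corpus actually contains.
    prefixes = [[]]
    for options in alternatives:
        prefixes = [p + [t] for p in prefixes for t in options
                    if count_occurrences(tokens, sa, p + [t]) > 0]
    return sum(count_occurrences(tokens, sa, p) for p in prefixes)

corpus = "the cat sat on the mat the cat ran".split()
sa = build_suffix_array(corpus)
exact = count_occurrences(corpus, sa, ["the", "cat"])
soft = soft_count(corpus, sa, [["the"], ["cat", "mat"]])
```

Here `exact` counts the phrase "the cat" and `soft` also admits "the mat" at the second position. The early filter on prefixes is the toy analogue of corpus-aware pruning: a relaxed candidate is extended only if its prefix occurs at least once.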
PARD: Permutation-invariant Autoregressive Diffusion for Graph Generation
Specifically, we show that contrary to sets, elements in a graph are not entirely unordered and there is a unique partial order for nodes and edges. With this partial order, PARD generates a graph in a block-by-block, autoregressive fashion, where each block's probability is conditionally modeled by a shared diffusion model with an equivariant network.
Data-Driven Global Sensitivity Analysis for Engineering Design Based on Individual Conditional Expectations
Palar, Pramudita Satria, Saves, Paul, Regis, Rommel G., Shimoyama, Koji, Obayashi, Shigeru, Verstaevel, Nicolas, Morlier, Joseph
Explainable machine learning techniques have gained increasing attention in engineering applications, especially in aerospace design and analysis, where understanding how input variables influence data-driven models is essential. Partial Dependence Plots (PDPs) are widely used for interpreting black-box models by showing the average effect of an input variable on the prediction. However, their global sensitivity metric can be misleading when strong interactions are present, as averaging tends to obscure interaction effects. To address this limitation, we propose a global sensitivity metric based on Individual Conditional Expectation (ICE) curves. The method computes the expected feature importance across ICE curves, along with their standard deviation, to more effectively capture the influence of interactions. We provide a mathematical proof demonstrating that the PDP-based sensitivity is a lower bound of the proposed ICE-based metric under truncated orthogonal polynomial expansion. In addition, we introduce an ICE-based correlation value to quantify how interactions modify the relationship between inputs and the output. Comparative evaluations were performed on three cases: a 5-variable analytical function, a 5-variable wind-turbine fatigue problem, and a 9-variable airfoil aerodynamics case, where ICE-based sensitivity was benchmarked against PDP, SHapley Additive exPlanations (SHAP), and Sobol' indices. The results show that ICE-based feature importance provides richer insights than the traditional PDP-based approach, while visual interpretations from PDP, ICE, and SHAP complement one another by offering multiple perspectives.
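Why averaging hides interactions can be seen on a two-variable product function. The sketch below is an illustration under stated assumptions rather than the paper's exact metric: it takes the PDP-based importance to be the standard deviation of the PDP curve, and the ICE-based importance to be the mean (with spread) of the per-curve standard deviations. The function names and the test function f(x) = x0 * x1 are hypothetical.

```python
import random
from statistics import fmean, pstdev

def ice_curves(f, rows, j, grid):
    """One curve per data row: vary feature j over the grid, hold the rest fixed."""
    curves = []
    for row in rows:
        curve = []
        for g in grid:
            x = list(row)
            x[j] = g
            curve.append(f(x))
        curves.append(curve)
    return curves

def pdp_and_ice_importance(f, rows, j, grid):
    curves = ice_curves(f, rows, j, grid)
    # The PDP is the pointwise average of the ICE curves.
    pdp = [fmean(col) for col in zip(*curves)]
    pdp_importance = pstdev(pdp)
    # Assumed ICE-based metric: variability of each curve, then mean and spread.
    per_curve = [pstdev(c) for c in curves]
    return pdp_importance, fmean(per_curve), pstdev(per_curve)

random.seed(0)
rows = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(500)]
grid = [-1 + 0.1 * k for k in range(21)]
pdp_imp, ice_mean, ice_std = pdp_and_ice_importance(
    lambda x: x[0] * x[1], rows, 0, grid)
```

For this pure-interaction function the ICE curves are lines with slope x1, which average to a nearly flat PDP, so `pdp_imp` is close to zero while `ice_mean` is not. Under these definitions the PDP value can never exceed the ICE mean, because the standard deviation of an average of curves is at most the average of their standard deviations.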
Magnitude 6.7 quake off Aomori triggers tsunami advisory
A magnitude 6.7 earthquake triggered a tsunami advisory for parts of Hokkaido as well as the coasts of Aomori, Iwate and Miyagi prefectures on Friday. The quake struck at 11:44 a.m., registering 4 on Japan's seismic intensity scale in some areas. Waves of up to 1 meter are possible in areas under the advisory, according to the Japan Meteorological Agency (JMA). A tsunami advisory, a level lower than a tsunami warning, urges those in the area to stay away from the ocean. Evacuation is not required under an advisory.
Storage capacity of perceptron with variable selection
Xu, Yingying, Ohzeki, Masayuki, Kabashima, Yoshiyuki
A central challenge in machine learning is to distinguish genuine structure from chance correlations in high-dimensional data. In this work, we address this issue for the perceptron, a foundational model of neural computation. Specifically, we investigate the relationship between the pattern load $α$ and the variable selection ratio $ρ$ for which a simple perceptron can perfectly classify $P = αN$ random patterns by optimally selecting $M = ρN$ variables out of $N$ variables. While the Cover--Gardner theory establishes that a random subset of $ρN$ dimensions can separate $αN$ random patterns if and only if $α < 2ρ$, we demonstrate that optimal variable selection can surpass this bound by developing a method, based on the replica method from statistical mechanics, for enumerating the combinations of variables that enable perfect pattern classification. This not only provides a quantitative criterion for distinguishing true structure in the data from spurious regularities, but also yields the storage capacity of associative memory models with sparse asymmetric couplings.
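The random-subset baseline $α < 2ρ$ can be checked numerically with Cover's function-counting result: for $P$ points in general position in $M$ dimensions, the fraction of the $2^P$ labelings that a perceptron through the origin can realize is $2^{1-P} \sum_{k=0}^{M-1} \binom{P-1}{k}$. The sketch below covers only this baseline, not the replica calculation for optimal variable selection, and the function name is hypothetical.

```python
from math import comb

def separable_fraction(P, M):
    """Fraction of the 2^P dichotomies of P points in general position in
    M dimensions that are linearly separable (Cover's counting function)."""
    return sum(comb(P - 1, k) for k in range(M)) / 2 ** (P - 1)

# With M = rho * N selected variables and P = alpha * N patterns, the fraction
# is exactly 1/2 at P = 2M, i.e. at alpha = 2 * rho.
at_threshold = separable_fraction(200, 100)
below = separable_fraction(150, 100)   # alpha / rho = 1.5
above = separable_fraction(400, 100)   # alpha / rho = 4
```

As $M$ grows, the transition around $P = 2M$ sharpens into a step, which is the sense in which a random subset of $ρN$ variables separates $αN$ random patterns only for $α < 2ρ$.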
Japan town retracts bear sighting warning sparked by AI image
A town in Miyagi Prefecture has retracted its social media post warning of a bear sighting after discovering an image submitted to it had been generated using artificial intelligence. A Japanese town has deleted a social media post warning of a bear sighting after discovering that a picture it had received showing the fearsome creature was generated using artificial intelligence. Similar fake images have been circulating online as fear of bears runs high in the country, where the animals have killed a record 13 people this year. "The town prioritized informing residents to avoid danger, but we apologize for causing any anxiety or confusion," the town of Onagawa, Miyagi Prefecture, said on its official X social media account on Wednesday.
Sub-exponential Growth of New Words and Names Online: A Piecewise Power-Law Model
The diffusion of ideas and language in society has conventionally been described by S-shaped models, such as the logistic curve. However, the role of sub-exponential growth -- a slower-than-exponential pattern known in epidemiology -- has been largely overlooked in broader social phenomena. Here, we present a piecewise power-law model to characterize complex growth curves with a few parameters. We systematically analyzed a large-scale dataset of approximately one billion Japanese blog articles linked to Wikipedia vocabulary, and observed consistent patterns in web search trend data (English, Spanish, and Japanese). Our analysis of 2,963 items, selected for reliable estimation (e.g., sufficient duration/peak, monotonic growth), reveals that 1,625 (55%) diffusion patterns without abrupt level shifts were adequately described by one or two segments. For single-segment curves, we found that (i) the mode of the shape parameter $α$ was near 0.5, indicating prevalent sub-exponential growth; (ii) the peak diffusion scale is primarily determined by the growth rate $R$, with minor contributions from $α$ or the duration $T$; and (iii) $α$ showed a tendency to vary with the nature of the topic, being smaller for niche/local topics and larger for widely shared ones. Furthermore, a micro-behavioral model of outward (stranger) vs. inward (community) contact suggests that $α$ can be interpreted as an index of the preference for outward-oriented communication. These findings suggest that sub-exponential growth is a common pattern of social diffusion, and our model provides a practical framework for consistently describing, comparing, and interpreting complex and diverse growth curves.
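The abstract does not state the functional form of the model, so the following is only an illustration of what sub-exponential growth means, using the generalized-growth model from epidemiology, $dC/dt = r\,C^{p}$ with $0 \le p \le 1$. The parameters `r` and `p` here are assumptions of this sketch and should not be read as the paper's $R$ and $α$.

```python
from math import exp

def generalized_growth(t, c0, r, p):
    """Closed-form solution of dC/dt = r * C**p with C(0) = c0.
    p = 1 gives exponential growth; p < 1 gives slower, polynomial growth."""
    if p == 1.0:
        return c0 * exp(r * t)
    return (c0 ** (1 - p) + (1 - p) * r * t) ** (1 / (1 - p))

# Same starting value and rate, three growth regimes at t = 3.
linear = generalized_growth(3.0, 1.0, 2.0, 0.0)       # C = c0 + r * t
quadratic = generalized_growth(3.0, 1.0, 2.0, 0.5)    # C = (sqrt(c0) + r * t / 2) ** 2
exponential = generalized_growth(3.0, 1.0, 2.0, 1.0)  # C = c0 * exp(r * t)
```

For any exponent below 1 the curve grows like a power of time rather than exponentially, which is the qualitative behavior the abstract describes as prevalent.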
Portuguese Man O'War species honors 'One-Eyed Dragon' samurai
The newly discovered P. mikazuki is a tribute to the famous warrior Date Masamune. A team of university students in Japan identified an entirely new species of the mighty Portuguese Man O'War. Described in a study recently published in the journal, the creature's distinct features and fearsome venom have earned it a name that honors a famous 16th-century samurai warrior. It's easy to mistake the Portuguese Man O'War () for a jellyfish.
Yoshihiro Murai clinches sixth term as Miyagi governor
SENDAI - Yoshihiro Murai held off four other candidates to clinch his sixth term as governor of Miyagi Prefecture in Sunday's gubernatorial election. Murai, an independent candidate who had support from prefectural assembly members of the Liberal Democratic Party, Japan Innovation Party and Komeito, highlighted his achievements as the prefecture's governor spanning five terms, or 20 years. The 65-year-old former chief of the National Governors' Association pledged to enhance productivity by promoting digital transformation using generative artificial intelligence, in anticipation of a further population decline. He successfully fended off Masamune Wada, 51, also an independent candidate, who had been closing in.